{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ijGzTHJJUCPY"
      },
      "outputs": [],
      "source": [
        "# Copyright 2024 Google LLC\n",
        "#\n",
        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "#     https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "VEqbX8OhE8y9"
      },
      "source": [
        "# Enhance Imagen prompts with Gemini\n",
        "\n",
        "<table align=\"left\">\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/vision/getting-started/prompt_generation_with_gemini.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg\" alt=\"Google Colaboratory logo\"><br> Run in Colab\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fvision%2Fgetting-started%2Fprompt_generation_with_gemini.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\" alt=\"Google Cloud Colab Enterprise logo\"><br> Run in Colab Enterprise\n",
        "    </a>\n",
        "  </td>    \n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/vision/getting-started/prompt_generation_with_gemini.ipynb\">\n",
        "      <img src=\"https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://github.com/GoogleCloudPlatform/generative-ai/blob/main/vision/getting-started/prompt_generation_with_gemini.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://raw.githubusercontent.com/primer/octicons/refs/heads/main/icons/mark-github-24.svg\" alt=\"GitHub logo\"><br> View on GitHub\n",
        "    </a>\n",
        "  </td>\n",
        "</table>\n",
        "\n",
        "<div style=\"clear: both;\"></div>\n",
        "\n",
        "<b>Share to:</b>\n",
        "\n",
        "<a href=\"https://www.linkedin.com/sharing/share-offsite/?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/vision/getting-started/prompt_generation_with_gemini.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/8/81/LinkedIn_icon.svg\" alt=\"LinkedIn logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://bsky.app/intent/compose?text=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/vision/getting-started/prompt_generation_with_gemini.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/7/7a/Bluesky_Logo.svg\" alt=\"Bluesky logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://twitter.com/intent/tweet?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/vision/getting-started/prompt_generation_with_gemini.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/5a/X_icon_2.svg\" alt=\"X logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://reddit.com/submit?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/vision/getting-started/prompt_generation_with_gemini.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://redditinc.com/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png\" alt=\"Reddit logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://www.facebook.com/sharer/sharer.php?u=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/vision/getting-started/prompt_generation_with_gemini.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/51/Facebook_f_logo_%282019%29.svg\" alt=\"Facebook logo\">\n",
        "</a>            \n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "G1KDmM_PBAXz"
      },
      "source": [
        "| | |\n",
        "|-|-|\n",
        "|Author(s) | [Katie Nguyen](https://github.com/katiemn) |"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "CkHPv2myT2cx"
      },
      "source": [
        "## Overview\n",
        "\n",
        "### Gemini 2.0\n",
        "\n",
        "Gemini 2.0 is a new language model from the Gemini family. This model introduces a long context window of up to 2 million tokens that can seamlessly analyze large amounts of information. Additionally, it is multimodal with the ability to process text, images, audio, video, and code. Learn more about [Gemini 2.0](https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/).\n",
        "\n",
        "### Imagen 3\n",
        "\n",
        "Imagen 3 on Vertex AI brings Google's state of the art generative AI capabilities to application developers. Imagen 3 is Google's highest quality text-to-image model to date. It's capable of creating images with astonishing detail. Thus, developers have more control when building next-generation AI products that transform their imagination into high quality visual assets. Learn more about [Imagen on Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/image/overview).\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DrkcqHrrwMAo"
      },
      "source": [
        "In this tutorial, you will learn how to use the Vertex AI SDK for Python to interact with Gemini 2.0 and Imagen 3 to:\n",
        "\n",
        "- Optimize your image generation prompts with Gemini based on a subject, action, scene, and style\n",
        "- Rewrite your prompt based on good example prompts\n",
        "- Generate enhanced prompts from an initial image"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "r11Gu7qNgx1p"
      },
      "source": [
        "## Get Started\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "No17Cw5hgx12"
      },
      "source": [
        "### Install Vertex AI SDK for Python\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "tFy3H3aPgx12"
      },
      "outputs": [],
      "source": [
        "%pip install --upgrade --user --quiet google-cloud-aiplatform"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "R5Xep4W9lq-Z"
      },
      "source": [
        "### Restart runtime\n",
        "\n",
        "To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which restarts the current kernel.\n",
        "\n",
        "The restart might take a minute or longer. After it's restarted, continue to the next step."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "XRvKdaPDTznN"
      },
      "outputs": [],
      "source": [
        "import IPython\n",
        "\n",
        "app = IPython.Application.instance()\n",
        "app.kernel.do_shutdown(True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "SbmM4z7FOBpM"
      },
      "source": [
        "<div class=\"alert alert-block alert-warning\">\n",
        "<b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. ⚠️</b>\n",
        "</div>\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dmWOrTJ3gx13"
      },
      "source": [
        "### Authenticate your notebook environment (Colab only)\n",
        "\n",
        "If you are running this notebook on Google Colab, run the following cell to authenticate your environment.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "NyKGtVQjgx13"
      },
      "outputs": [],
      "source": [
        "import sys\n",
        "\n",
        "if \"google.colab\" in sys.modules:\n",
        "    from google.colab import auth\n",
        "\n",
        "    auth.authenticate_user()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DF4l8DTdWgPY"
      },
      "source": [
        "### Set Google Cloud project information and initialize Vertex AI SDK\n",
        "\n",
        "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).\n",
        "\n",
        "Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Nqwi-5ufWp_B"
      },
      "outputs": [],
      "source": [
        "# Use the environment variable if the user doesn't provide Project ID.\n",
        "import os\n",
        "\n",
        "import vertexai\n",
        "\n",
        "# fmt: off\n",
        "PROJECT_ID = \"[your-project-id]\"  # @param {type: \"string\", placeholder: \"[your-project-id]\", isTemplate: true}\n",
        "# fmt: on\n",
        "if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n",
        "    PROJECT_ID = str(os.environ.get(\"GOOGLE_CLOUD_PROJECT\"))\n",
        "\n",
        "LOCATION = os.environ.get(\"GOOGLE_CLOUD_REGION\", \"us-central1\")\n",
        "\n",
        "vertexai.init(project=PROJECT_ID, location=LOCATION)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jXHfaVS66_01"
      },
      "source": [
        "### Import libraries\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "lslYAvw37JGQ"
      },
      "outputs": [],
      "source": [
        "from vertexai.generative_models import GenerativeModel, Image, Part\n",
        "from vertexai.preview.vision_models import ImageGenerationModel"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "BY1nfXrqRxVX"
      },
      "source": [
        "### Load the Gemini 2.0 and Imagen 3 models\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "2998506fe6d1"
      },
      "outputs": [],
      "source": [
        "model = GenerativeModel(\"gemini-2.0-flash\")\n",
        "image_generation_model = ImageGenerationModel.from_pretrained(\"imagen-3.0-generate-001\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Sr2Y3lFwKW1M"
      },
      "source": [
        "### Define helper functions"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "r_38e5rRKB6s"
      },
      "outputs": [],
      "source": [
        "import typing\n",
        "\n",
        "import IPython.display\n",
        "from IPython.display import display\n",
        "from PIL import Image as PIL_Image\n",
        "from PIL import ImageOps as PIL_ImageOps\n",
        "\n",
        "def display_image(\n",
        "    image: Image,\n",
        "    max_width: int = 600,\n",
        "    max_height: int = 350,\n",
        ") -> None:\n",
        "    pil_image = typing.cast(\"PIL_Image.Image\", image._pil_image)\n",
        "    if pil_image.mode != \"RGB\":\n",
        "        # RGB is supported by all Jupyter environments (e.g. RGBA is not yet)\n",
        "        pil_image = pil_image.convert(\"RGB\")\n",
        "    image_width, image_height = pil_image.size\n",
        "    if max_width < image_width or max_height < image_height:\n",
        "        # Resize to display a smaller notebook image\n",
        "        pil_image = PIL_ImageOps.contain(pil_image, (max_width, max_height))\n",
        "    IPython.display.display(pil_image)\n",
        "\n",
        "\n",
        "def get_url_from_gcs(gcs_uri: str) -> str:\n",
        "    # Converts GCS uri to url for image display\n",
        "    url = \"https://storage.googleapis.com/\" + gcs_uri.replace(\"gs://\", \"\").replace(\n",
        "        \" \", \"%20\"\n",
        "    )\n",
        "    return url"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6ASy6PkRteoT"
      },
      "source": [
        "### Optimize your prompt\n",
        "\n",
        "Input a subject, action, and scene you would like to generate. The Gemini prompt will expand these initial ideas, but provide as much detail as possible. Choose a style from the list for your image as well."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "s2Vx2W91bl8n"
      },
      "outputs": [],
      "source": [
        "subject = \"\"  # @param {type:\"string\"}\n",
        "action = \"\"  # @param {type:\"string\"}\n",
        "scene = \"\"  # @param {type:\"string\"}\n",
        "# fmt: off\n",
        "style = \"photorealistic\"  # @param [\"photorealistic\", \"illustration\", \"digital\", \"minimalist\", \"vintage photo\", \"abstract\", \"cinematic\"]\n",
        "# fmt: on\n",
        "\n",
        "prompt = \"\"\"\n",
        "Use the provided subject, action, scene, and style\n",
        "to produce a descriptive prompt that can be used to generate an image\n",
        "with a text-to-image model.\n",
        "\n",
        "Specify where the subject is situated within the image.\n",
        "\n",
        "Indicate how the light plays a role in the image.\n",
        "\n",
        "Only return the prompt string.\n",
        "\"\"\"\n",
        "\n",
        "contents = [subject, action, scene, style, prompt]\n",
        "\n",
        "responses = model.generate_content(contents)\n",
        "display(IPython.display.Markdown(\"***Prompt:*** \" + responses.text))\n",
        "\n",
        "# generate image from Gemini written prompt\n",
        "images = image_generation_model.generate_images(\n",
        "    prompt=responses.text,\n",
        "    number_of_images=1,\n",
        "    aspect_ratio=\"1:1\",\n",
        "    safety_filter_level=\"block_few\",\n",
        "    person_generation=\"dont_allow\",\n",
        ")\n",
        "\n",
        "display_image(images[0])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "WPJUbWsLjYt-"
      },
      "source": [
        "### Rewrite your prompt\n",
        "\n",
        "You can also use Gemini to rewrite an entire prompt. Enter your prompt in ```user_prompt```. You can be as detailed or as simple as you would like. This prompt enhances your initial prompt based on some provided gold standard sample prompts. You can edit the 5 example prompts below to better match your own use cases for more tailored results. See the next section for more details."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "3pV8HnZqkly7"
      },
      "outputs": [],
      "source": [
        "# fmt: off\n",
        "user_prompt = \"\"  # @param {type:\"string\"}\n",
        "# fmt: on\n",
        "\n",
        "prompt = \"\"\"\n",
        "I will provide some example prompts for image generation.\n",
        "\n",
        "1. Science fiction spacesuit, white and green color scheme,\n",
        "highly detailed, intricate, full body armor, futuristic, tactical,\n",
        "dark gray background, front view, upper body, 3d render.\n",
        "\n",
        "2. A woman in profile view, gazing into the distance, against a backdrop of a vibrant,\n",
        "swirling nebula of magenta and purple hues. Superimposed over the nebula\n",
        "is a web of interconnected nodes, each housing a distinct social media icon. The centerpiece of this web is a\n",
        "banner proclaiming \"FUTURE OF MESSAGING\" in bold, futuristic font.\n",
        "\n",
        "3. A large, round, modern coffee table made of light oak wood sits in a dimly\n",
        "lit, minimalist room with dark gray walls. The table has four thick,\n",
        "cylindrical legs that are evenly spaced. An open magazine with white pages\n",
        "rests on the table's surface.  In the background, a simple chair with a round,\n",
        "wooden seat and a black metal frame is slightly out of focus. The floor is a\n",
        "medium gray concrete with subtle light reflections.\n",
        "\n",
        "4. A stylish man stands before the New York City skyline, bathed in the golden\n",
        "light of a late afternoon. He is wearing a green polo shirt with the top\n",
        "button fastened. Over the polo, he wears a light brown, unzipped bomber jacket.\n",
        "His dark brown hair is styled with a relaxed wave. He is looking down,\n",
        "a contemplative expression on his face.\n",
        "\n",
        "5. A perfume bottle with a golden cap standing on a white surface,\n",
        "hit by natural window light creating a sharp shadow on the right.\n",
        "The bottle is rectangular, made of clear glass and contains a light\n",
        "amber-colored fragrance. The background is a simple white wall.\n",
        "\n",
        "Consider these prompts. Based on these examples, rewrite the user provided\n",
        "prompt in a similar format.\n",
        "\"\"\"\n",
        "\n",
        "contents = [user_prompt, prompt]\n",
        "\n",
        "responses = model.generate_content(contents)\n",
        "display(IPython.display.Markdown(\"***Prompt:*** \" + responses.text))\n",
        "\n",
        "# generate image from Gemini written prompt\n",
        "images = image_generation_model.generate_images(\n",
        "    prompt=responses.text,\n",
        "    number_of_images=1,\n",
        "    aspect_ratio=\"1:1\",\n",
        "    safety_filter_level=\"block_few\",\n",
        "    person_generation=\"dont_allow\",\n",
        ")\n",
        "\n",
        "display_image(images[0])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "A9Lj7igLCTWa"
      },
      "source": [
        "### Generate elaborate prompts from images\n",
        "\n",
        "To get more example prompts you can supply images that fit your use case and have Gemini write a prompt for the provided image. Modify ```gcs_uri``` to point to your image in Google Cloud Storage."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "jMgA_aIjXZ7G"
      },
      "outputs": [],
      "source": [
        "# Load an image from Google Cloud Storage\n",
        "# fmt: off\n",
        "gcs_uri = \"gs://cloud-samples-data/generative-ai/image/dog-ad-1.png\"  # @param {type:\"string\"}\n",
        "# fmt: on\n",
        "example_image = Part.from_uri(gcs_uri, mime_type=\"image/png\")\n",
        "\n",
        "# Display image\n",
        "url = get_url_from_gcs(example_image.file_data.file_uri)\n",
        "IPython.display.Image(url, width=350)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "db6r5wB4oMnu"
      },
      "source": [
        "Run the following cell to generate a detailed prompt-like description of the previous image. This output can be copied and pasted into the Gemini prompt in the \"Rewrite your prompt\" section as a gold standard prompt. Each prompt that you add/replace in that section will help tailor the style of the rewritten prompts."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "tX5kTqD6yNwl"
      },
      "outputs": [],
      "source": [
        "gemini_prompt = \"\"\"\n",
        "Based on the provided image, create a highly detailed prompt for a text-to-image model.\n",
        "Combine all attributes into a statement of the most important attributes.\n",
        "Include extra details where necessary and make sure to include details and words in the image.\n",
        "\"\"\"\n",
        "\n",
        "contents = [gemini_prompt, example_image]\n",
        "\n",
        "responses = model.generate_content(contents)\n",
        "display(IPython.display.Markdown(\"***Prompt:*** \" + responses.text))"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "name": "prompt_generation_with_gemini.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
